Wavelet optimization for applying continuous wavelet transform to maternal electrocardiogram component enhancing
Yu Qiong1, Guan Qun2, Li Ping2, Liu Tie-Bing2, Si Jun-Feng1, Zhao Ying1, Liu Hong-Xing1, †, Wang Yuan-Qing1
School of Electronic Science and Engineering, Nanjing University, Nanjing 210023, China
Nanjing General Hospital of Nanjing Military Command, Nanjing 210002, China

 

† Corresponding author. E-mail: njhxliu@nju.edu.cn

Project supported by the National Natural Science Foundation of China (Grant No. 61271079)

Abstract

In the procedure of non-invasive fetal electrocardiogram (ECG) extraction, high-quality maternal R wave peak detection demands enhancing the maternal ECG component firstly. Among all the enhancing algorithms, the one based on the continuous wavelet transform (CWT) is very important and its effectiveness depends on the optimization of the used wavelet. However, up to now, there is still no clear conclusion on the optimal wavelet (including type and scale) for CWT to enhance the maternal ECG component of an abdominal ECG signal. To solve this problem, in this paper, we select several common used types of wavelets to carry out our research on what the optimal wavelets are. We first establish big-enough training datasets with different sampling rates and make a maternal QRS template for each signal in the training datasets. Second, for each type of selected wavelets, we find its optimal scale corresponding to each QRS template in a training dataset based on the principle of maximal correlation. Then calculating the average of all optimized wavelet scales results in the mean optimal wavelet of this type for the dataset. We use two original abdominal ECG databases to train and test the optimized mean optimal wavelets. The test results show that, as a whole, the mean optimal wavelets obtained are superior to the wavelets used in other publications for applying CWT to maternal ECG component enhancing.

PACS: 87.85.Ng
1. Introduction

Almost all promising non-invasive fetal electrocardiogram (ECG) extraction methods[18] involve accurately determining maternal ECG R-wave peaks. The high-quality maternal R wave peak detection demands enhancing the maternal ECG component firstly. Researchers have proposed a variety of ECG enhancing algorithms,[917] such as the difference method,[9] the Hilbert transform method,[10] the template matching method,[11] the wavelet transform method,[1217] and so on. Among all the enhancing methods, the wavelet transform method is referred to as a very important one.

Specifically, there are two ways for the wavelet transform method to enhance the maternal ECG component: one[1214] is to compute the continuous wavelet transform coefficients of an abdominal ECG signal based on a selected wavelet type and a certain scale and then take the modulus of the wavelet transform coefficients as the enhanced signal; the other[1517] is to compute the wavelet transform coefficients of the abdominal ECG signal based on multiple scales, then take some coefficients on different scales to reconstruct an enhanced ECG signal with larger SNR. Both enhancing methods are affected by the wavelet type and the scale selected, our research will only focus on the wavelet optimization in the first enhancing method above.

Nowadays, when applying continuous wavelet transform (CWT) to enhance the maternal ECG component, most researchers[1214,18] usually determine the mother wavelet and scale through their experiences or consulting other publications, but do not carry out deep researches about the wavelet optimization. Fard et al.[19] designed a new hybrid wavelet named HCW by combining the complex frequency B-spline wavelet (fbsp) and complex morlet (cmor) wavelet linearly, and proposed to determine the parameters and combination coefficients of the hybrid wavelet by the theory of template matching. However, they did not introduce the parameter determination method in detail and clearly, for instance, they did not explain what kind of standard ECG template database has been taken, and what kind of objective function has been used for the parameter optimization. Their main contribution is raising a new type of wavelet for ECG enhancing, rather than paying enough attention to optimize the parameters.

In this paper, we will first establish two training datasets from two original abdominal ECG databases with different sampling frequencies and make a maternal QRS template for each signal in the training datasets. Then we will select several commonly used types of wavelets,[1221] including fbsp,[19] cmor,[19] Mexican hat wavelet (Mexh),[12,14] Daubechies wavelet (db),[17] Biorthogonal wavelet (bior),[20] and symlet wavelet (sym),[21] to carry out our research. For each type of wavelet, we will find one optimal scale for each QRS template of a training dataset based on the principle of maximal correlation and then work out a mean optimal wavelet for the whole training dataset. The mean optimal wavelets obtained through big databases should be more suitable for maternal ECG component enhancing statistically. Finally, we will use the two test datasets from the original abdominal ECG databases to test them.

The paper is arranged as follows. In Section 2, we will introduce the theory of CWT, the wavelets of fbsp, cmor, Mexh, db, bior, sym, and the two original abdominal ECG databases. In Section 3, we will describe our proposed wavelet optimization algorithm and test method thoroughly. In Section 4, we will present the mean optimal wavelet for each type and the comparative test results between the optimized wavelets and the wavelets used in other publications. Sections 5 and 6 present our discussion and conclusion.

2. Materials
2.1. Continuous wavelet transform and common wavelets for ECG enhancing

The CWT of a signal x(t) is defined by[14] where ψ(t) is the complex conjugate of the mother wavelet function ψ(t), a is the dilation parameter of the wavelet, called the scale, and b is the location parameter of the wavelet.[14] More details about the wavelet theory can be seen in some previous publications.[22,23]

Several types of wavelets, fbsp, cmor, Mexh, db, bior, and sym, which are commonly used for ECG enhancing in other publications, are introduced as follows.

The fbsp function is defined as[19] The cmor function is[19] where m is an integer order parameter, fb is a bandwidth parameter, and fc is the wavelet center frequency.[19] The Mexh function is where .

The other three wavelets are different from the above three. They have no explicit mathematical equations and their wavelet functions ψ(t) are related with scaling functions ϕ(t) through the two-scale difference equation In the frequency domain, the equation is where h(n) and g(n) are the filtering coefficients; and H(w) and G(w) are their Fourier transforming results. People usually use the theory of iteration to obtain ψ(t) and ϕ(t) with Eq. (6). In Matlab, wavefun.m has implemented the algorithm and we can use it to obtain the wavelets' waveforms.

The waveforms of the six selected wavelets are shown in Fig. 1.

Fig. 1. (color online) Waveforms of six selected wavelets: (a) fbsp1-2-1.5, (b) cmor1-1.5, (c) Mexh, (d) db4, (e) bior1.5, and (f) sym8.
2.2. Original abdominal ECG databases

Two abdominal ECG databases are used to build QRS templates and test the optimized wavelets in this paper. They are introduced as follows.

Database 1 (collected by ourselves) The authors collected actual ECG data from a hospital. The methods of collecting the data can be found in the literature.[24] We conducted the collecting experiment using 78 pregnant women in the 38th–40th week of gestation in the Obstetrics Branch of Nanjing General Hospital of Nanjing Military Command. The data collecting system was composed of a standard 12-lead ECG machine and a personal computer (PC). The standard 12-lead ECG machine, made by Ni-hon Kohden Corporation with the model No. 1350p, was connected to the PC with a dedicated USB cable. The ECG machine acted as a data acquisition module. The parameters of this machine must be preset; the sampling frequency was set to 500 Hz, the cut-off frequency of the anti-aliasing filter was set to 75 Hz, and the switches for baseline drift suppression and EMG interference suppression were turned on. As for the placement of electrodes, we designed three placement schemes for ten electrodes and the details can be found in our previous publication.[24]

The experimental protocol is described as follows. (i) Let the woman lie down on a bed, clean her abdominal skin with alcohol, and apply saline water at the positions where the electrodes would be pasted to remove the grease on the skin and enhance the conductivity of the electrodes. (ii) Place the ten electrodes according to the first placement scheme, then record 1–2 groups of data with a duration of 24 s. (iii) Adjust the positions of the electrodes according to the second placement scheme and record 1–2 groups of data with a duration of 24 s. (iv) Adjust the positions of the electrodes according to the third placement scheme and record 1–2 groups of data with a duration of 24 s. For every pregnant woman, we collected 3–6 groups of data.

Finally, we totally collected 343 groups of data, and every group of data was comprised of 8-channel abdominal ECG signals with a duration of 24 s.

Database 2 (challenge data) The challenge data were gathered from the Physionet/Computing in Cardiology Challenge 2013. The datasets used for the challenge were obtained from five different sources (including real and simulated data), yielding a total of 447 records. All records were formatted to have a 1 kHz sampling frequency, one-minute duration, and four channels of non-invasive abdominal maternal ECG leads. The records were divided into three datasets for the challenge: set A (75 records, both records and reference for FQRS locations were made public), set B (100 records, only the records were made public), and set C (272 records, both records and reference for FQRS locations were withheld from the public). More detailed information about the database can be found in Ref. [25]. In our test, we only used the public datasets A and B for the test, i.e., 175 records in all.

For each original database, we will establish a training dataset for the wavelet optimization and a test dataset to test the optimization results respectively due to their different sampling frequencies.

3. Methods
3.1. Establishment of training and test datasets

Before establishing the training datasets, first, thinking of there being less leads in Database 2, for each multi-channel signal in it, we use the formulas new_sig = sig(i) − sig(j) and new_sig = sig(i) + sig(j) to increase the number of leads, where sig(i) and sig(j) are sub-signals of the multi-channel signals, i > j. Second, to obtain high-quality maternal QRS templates, we select all the effective signals from both abdominal ECG databases, excluding the signals seriously polluted by disturbance and noise. Figure 2 shows a selected effective signal and an excluded one.

Fig. 2. (color online)(a) A signal selected for training datasets. (b) A signal excluded.

The selection criterion is based on the template matching method. The details are as follows: first, carry out the initial maternal R wave peak detection on each signal[5] and obtain the corresponding QRS complex template whose duration is set to 0.12 s (this value is selected because the duration of QRS complex is usually less than 0.12 s);[26] second, calculate the correlation coefficients between each detected maternal R wave and the template, then compute the number of the coefficients higher than 0.8, which mean real or high-quality QRS waves based on the experiences of our previous research (in this step, we also record the positions of the maternal R wave peaks whose coefficients are higher than 0.8.); third, if the number is above 10 for Database 1 composed of short-duration data (24 s), or above 30 for Database 2 composed of long-duration data (60 s), the corresponding lead is selected to ensure both the quality and the quantity of signals in the training datasets. Finally we select 2649 leads of effective abdominal signals from Database 1 and 2655 leads from the extended Database 2 to build up the final training sets.

The establishment of the test datasets is much easier than that of the training datasets. First, extend Database 2 as above, and then select the signals whose maternal R wave peaks can be marked correctly no matter by an automatic algorithm[5] or by hand. Three authors from the hospital are experienced in reading ECG waveforms, and they guarantee that all of the R peaks marked by eye are reliable.

3.2. Optimization method

Before the optimization, considering that the wavelets may have more than one parameter, firstly we set the other parameters of the wavelets to fixed values and only let the scale be optimized. Since we need to compare our optimization results with the wavelets used in other publications, we select common used wavelets fbsp1-2-1.5,[19] cmor1-1.5,[19] Mexh,[12,14] db4,[17] bior1.5,[20] sym8[21] for optimization. Their waveforms can be seen in Fig. 1.

For a certain kind of wavelet, we will optimize it on each training dataset respectively. The optimization steps are as follows.

(i) Make a maternal QRS template for each signal in the training dataset. First, a pre-filtering method is applied to the signal, see Ref. [5]; second, take the positions of the maternal R wave peaks recorded in the establishment of the training dataset; third, take 0.12 s-length windowed signals whose centers are the detected R peaks and average them to obtain a 0.12 s-length QRS template.

(ii) Take out a QRS template and calculate the correlation coefficients between the wavelets under different scales and the template; the scale corresponding to the maximal correlation coefficient is the optimal scale for this template.

(iii) After taking out all the QRS templates and obtaining their optimal scale respectively as above, average all the optimal scales to generate the mean optimal wavelet.

At the second step above, according to our rough estimation, the maximal value of the optimal scales of the wavelets we used must be lower than 100 for 500 Hz (sampling frequency) data, so we set the searching range of scales for them to 0.5:0.5:100; and for the sampling frequency 1000 Hz, the range is set to 1:1:200 correspondingly. Besides, when the scale is a, the support interval of the wavelet will be a times as when the scale is 1, for example, [−a × 20, a × 20] for fbsp1-2-1.5. As for the sampling frequency of the wavelet, after a thorough study, we set it to 1 Hz, which is the same with cwt.m in Matlab.

3.3. Test method

To compare our mean optimal wavelets with the wavelets used in the literature, we use two indices of the enhanced signals obtained by CWT to evaluate them.

The first index is defined through the following steps. First, mark the correct maternal R peaks no matter whether by an automatic algorithm[5] or by hand; second, mark the prominent disturbances which may increase difficulties in R wave detection automatically using a very simple peak detector. One example is shown in Fig. 3; figure 3(a) is the original signal, and figure 3(b) is the enhanced signal after performing CWT with fbsp1-2-1.5; the red circles indicate the marked maternal R peaks, while the green circles indicate the marked prominent disturbances.

Fig. 3. (color online) Signal enhancement and peak marking. (a) Original signal with maternal R peaks marked. (b) Enhanced signal with maternal R peaks and disturbances marked.

The peak detector consists of two main parts: detect all peaks initially based on max search, and delete the peaks which are lower than the threshold. Its pseudo codes are presented in Appendix A.

In addition, after performing the peak detection, we should delete the maternal R wave peaks from the results by comparing them with the previous detected maternal R wave peaks. The details are as follows. If the absolute time difference between the detected peak and one maternal R wave peak marked before (confirming with eyes) is lower than a threshold 60 ms, we believe that this peak is corresponding to the maternal R wave and then delete it. The threshold 60 ms is determined by training in advance. Then, the prominent disturbances can be obtained.

After the maternal R wave peaks and disturbances are detected, the first index is defined as follows: where sAnoise is the sum of the amplitudes of the disturbances marked by the green circles, and sApeak is the sum of the amplitudes of the maternal R peaks. The new index is named as the noise-to-signal ratio (NSR). The lower NSR means the more successful maternal component enhancing. For the case that no disturbance on the enhanced signal has been detected, the NSR will be 0.

The second index is F1[5] and its definition is where TP is the number of correctly detected maternal R-wave peaks,[5] FP is the number of extra falsely detected maternal R-wave peaks,[5] and FN is the number of missed maternal R-wave peaks of all groups of data used.[5] We use this index to evaluate the detection results of the peak detector above on the enhanced signals obtained by CWT. The higher F1 means the more successful maternal component enhancing.

4. Results
4.1. Optimization results

Before showing the optimization results, one detail we should emphasize is that, the optimal scale we obtain directly above is related to the sampling frequency, which is a relative optimal scale. To achieve an optimal scale without the influence of the sampling frequency, we define the absolute optimal scale as K = scale/fs.

After using the six wavelets to implement the optimization experiments on the two training databases, we obtain the optimization results, which are shown in Table 1.

Table 1.

Our optimization results for the six wavelets based on the two training databases.

.
4.2. Test results

The obtained mean optimal wavelets of the six selected wavelets (fbsp1-2-1.5, cmor1-1.5, mexh, db4, bior1.5, sym8) and the corresponding wavelets used in the literature are tested on the two test databases. Before the test, through a literature review, we obtain the K's values of the five wavelets fbsp1-2-1.5,[19] cmor1-1.5,[19] db4,[17] bior1.5,[20] and mexh[12,14] used in other publications and show them in Table 2.

Table 2.

Absolute optimal scale K for five wavelets obtained in the literature.

.

In the test, when performing the peak detection, we should keep two parameters dilation and thrhd unchanged; dilation = 0.1, thrhd = 0.4. For Database 1, Loop = 10; for Database 2, Loop = 25.

For each test database, firstly, we calculate the NSR for each enhanced signal of the test dataset after performing CWT and then take the sum of the NSRs of all data as the quality indicator of the wavelet. The test results for the sum of the NSRs are shown in Table 3.

Table 3.

Test results for sum of the NSRs with Database 1 and Database 2.

.

Secondly, for each wavelet, F1 is obtained by first computing the TP, FP, and FN of each test dataset, and then calculating the parameter according to Eq. (8). Finally, the obtained F1 are shown in Table 4.

Table 4.

Test results for F1 with Database 1 and Database 2.

.

From the results, we can find that the effectivenesses of three wavelets fbsp1-2-1.5, bior1.5, Mexh for maternal ECG component enhancing are similar and relatively poorer; sym8, db4, and especially cmor1-1.5 are much better.

5. Discussion

By comparing our optimization results with the wavelets in other publications, we can also find that the optimization proposed in our manuscript is effective. Besides, in the whole method, the training datasets establishment and wavelet optimization are automatic and quick, which have made the experiment easy to carry out. Even if the test datasets are established and the test needs people's assistant to help confirm the maternal R wave peaks, it is very simple and easy to do in general.

The first database has a very limited gestation week range (38–40) while the second is not. However, the experimental results have shown that the optimal wavelets have nothing to do with that. The reason may be that the optimal wavelets we obtained are for maternal component enhancing.

Though we only used six types of wavelets as examples, the optimization and the test method for optimal wavelets can be applied to any kind of wavelets and supply more information about the optimal wavelet. Although our experiments are carried on the abdominal ECG databases of pregnant women, the optimization and test method can also be applied to ECG databases of other people. This work will be done in the future.

In the establishment of the training datasets, we have used the template matching method to estimate the quality of the signals. In our opinion, this method is very promising and deserves more attention.

The authors also think that it is significant to introduce the wavelet scale related index K in our paper, since the obtained mean optimal scale described by K has excluded the factor of sampling rate. In practice, it is easy for us to multiply the obtained optimal K with the actual signal sampling frequency to obtain the wavelet scale for use.

6. Conclusion

We obtain the mean optimal wavelets which are superior to the ones in the other publications when used for maternal ECG component enhancing. As for the wavelet types, the effectivenesses of three wavelets fbsp1-2-1.5, bior1.5, Mexh for maternal ECG component enhancing are similar and relatively poorer; while sym8, db4 and especially cmor1-1.5 are much better.

It is believed that the optimized wavelets in this paper can be used when applying CWT to maternal ECG component enhancing. At the same time, the proposed method to optimize and evaluate wavelets is helpful for similar research.

Reference
[1] Kuzilek J Lhotska L 2013 Computing in Cardiology Conference IEEE 40 177
[2] Perlman O Katz A Zigel Y 2013 Computing in Cardiology Conference IEEE 40 169
[3] Maria C D Duan W Bojarnejad M et al. 2013 Computing in Cardiology Conference IEEE 40 305
[4] Andreotti F Riedl M Himmelsbach T Wedekind D Wessel N Stepan H Schmieder C Jank A Malberg H Zaunseder S 2014 Physiol. Meas. 35 1551
[5] Behar J Oster J Clifford G D 2014 Physiol. Meas. 35 1569
[6] Varanini M Tartarisco G Billeci L Macerata A Pioggia G Balocchi R 2014 Physiol. Meas. 35 1607
[7] Varaninia M Tartariscob G Balocchia R Maceratac A Pioggiab G Billeci L 2017 Comput. Biol. Med. 85 125
[8] Gupta P Sharma K K Joshi S D 2016 Computers in Biology & Medicine 68 121
[9] Pan J Tompkins W J 1985 IEEE Trans. Biomed. Eng. 32 230
[10] Nygards M Sommo L 1983 Med. Biol. Eng. Comput. 21 538
[11] Krasteva V Jekova I 2007 Ann. Biomed. Eng. 35 2065
[12] Qi H B Liu X F Pan C 2010 IEEE ICICTA 1 22
[13] Karvounis E C Papaloukas C Fotiadis D I Michalis L K 2004 Computers in Cardiology IEEE 31 737
[14] Legarreta I R Addison P S Grubb N Clegg G R 2003 Computers in Cardiology IEEE 30 565
[15] Banerjee S Gupta R Mitra M 2012 Measurement 45 474
[16] Behbahani S Dabanloo N J 2012 Computing in Cardiology IEEE 38 805
[17] Mehta P Kumari M 2012 Int. Journal of applied Engineering Research 7 11
[18] Abdelliche F Charef A 2009 International Conference on Electrical and Electronics Engineering IEEE II 267 10.1109/ELECO.2009.5355331
[19] Fard P J Moradi M H Tajvidi M R 2008 International Journal of Cardiology 124 250
[20] Wu S Shen Y Zhou Z Lin L Zeng Y Gao X 2013 Comput. Biol. Med. 43 1622
[21] Acharyya A Maharatna K Al-Hashimi B M Mondal S 2010 Conf. Proc. IEEE Eng. Med. Biol. Soc. 10 1142
[22] Daubechies I 1990 IEEE Transactions on Information Theory 36 961
[23] Mallat S G 1989 IEEE Transactions on Pattern Analysis & Machine Intelligence 11 674
[24] Zhang J M Huang X L Guan Q Liu T B Li P Zhao Y Liu H X 2015 Chin. Phys. 24 442
[25] Clifford G D Silva I Behar J Moody G B 2014 Physiol Meas. 35 1521
[26] Nagendra H Mukherjee S Kumar V 2011 International Journal of Engineering Science & Technology 3